Healthy lifespan inequality: morbidity compression from a global perspective

Current measures of population health lack indicators capturing the variability in age-at-morbidity onset, an important marker to assess the timing patterns of individuals’ health deterioration and evaluate the compression of morbidity. We provide global, regional, and national estimates of the variability in morbidity onset from 1990 to 2019 using indicators of healthy lifespan inequality (HLI). Using data from the Global Burden of Disease Study 2019, we reconstruct age-at-death distributions to calculate lifespan inequality (LI), and age-at-morbidity onset distributions to calculate HLI. We measure LI and HLI with the standard deviation. Between 1990 and 2019, global HLI decreased from 24.74 years to 21.92, and has been decreasing in all regions except in high-income countries, where it has remained stable. Countries with high HLI are more present in sub-Saharan Africa and south Asia, whereas low HLI values are predominant in high-income countries and central and eastern Europe. HLI tends to be higher for females than for males, and HLI tends to be higher than LI. Globally, between 1990 and 2019 HLI at age 65 increased from 6.83 years to 7.44 for females, and from 6.23 to 6.96 for males. Improvements in longevity are not necessarily accompanied by further reductions in HLI among longevity vanguard countries. Morbidity is compressing, except in high-income countries, where it stagnates. The variability in the ages at morbidity onset tends to be larger than the variability in lifespans, and such divergence broadens over time. As longevity increases worldwide, the locus of health inequality is moving from death-related inequalities to disease- and disability-centered ones. Supplementary Information The online version contains supplementary material available at 10.1007/s10654-023-00989-3.


Appendix 1. Regional and country classification
We used the same regional and country classification as the Global Burden of Disease (GBD) Study 2019. 1 This comprises 204 countries and territories, 19 regions, and seven super-regions. Estimates at the global/world level are also provided.
The classification can be visualized at https://www.iapb.org/learn/vision-atlas/about/definitions-and-regions/ (accessed on May 2, 2022) and is summarized in the tables below. These tables have the following variables/columns:
• ISO3: Three-letter country codes (only for countries/territories)
• location_id: Location number assigned by GBD
• location_name: Name of the super-region, region, and country/territory
• Type: Whether the location is a super-region or a region
• Super-region: Super-region to which a country/territory belongs
• Region: Region to which a country/territory belongs

Appendix 2. Reconstruction of mortality and morbidity curves
All the concepts and definitions relating to life tables presented in the following have already been introduced and widely discussed in the literature. For additional details, the reader is referred to two useful handbooks by Chiang 2 and Preston et al. 3

Mortality curves
The Global Burden of Disease (GBD) Study 2019 publishes life table data for all countries and a great diversity of regions from 1950 to 2019. 4 Specifically, it reports data on age-specific probabilities of dying and age-specific remaining life expectancy that can be used to reconstruct life tables for all region- and country-years.
GBD reports estimates in 5-year age groups, except for the first two groups of length 1 and 4, respectively. Let $X = \{0, 1, 5, 10, \dots, 105, 110\}$ be the set of starting ages of all the age groups reported by GBD. Let ${}_nq_x$ denote the unconditional probability of dying between ages $x$ and $x+n$, where $x \in X$ and $n = 5$, except for the first two age groups in which $n = 1$ and $n = 4$, respectively. Then, the survival probability from birth to exact age $x$ is

$$\ell_x = \prod_{y \in X,\; y < x} \left(1 - {}_nq_y\right) \qquad \text{(A1)}$$

with the convention that the probability of surviving from birth to age 0 is $\ell_0 = 1$ (also known as the radix of the life table). The survival curve defined in (A1) is what we refer to as 'mortality curve' and can be used to calculate life expectancy (figure 1 panel A in the main manuscript). From (A1), we can derive the distribution of ages at death, given by

$${}_nd_x = \ell_x - \ell_{x+n} \qquad \text{(A2)}$$

for all $x \in X \setminus \{110\}$. For the last age group 110+, ${}_{\infty}d_{110} = \ell_{110}$ to close the life table. Note that ${}_nd_x$ denotes the proportion of deaths between ages $x$ and $x+n$, and that $\sum_{x \in X} {}_nd_x = 1$. The distribution of ages at death defined in (A2) can be used to measure the variability in ages at death and calculate lifespan inequality (LI) indicators (figure 1 panel C in the main manuscript).
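The construction of the survival curve and the age-at-death distribution can be illustrated with a minimal Python sketch. It assumes that (A1) is the cumulative product of the survival probabilities and that (A2) is the difference between consecutive survival values; the function names and the input probabilities are hypothetical and are not GBD data.

```python
def survival_curve(q):
    """(A1): l_x is the product of (1 - q_y) over age groups starting before x; l_0 = 1."""
    l = [1.0]
    for q_y in q[:-1]:          # the last (open-ended) group has no following age
        l.append(l[-1] * (1.0 - q_y))
    return l


def death_distribution(l):
    """(A2): d_x = l_x - l_{x+n}; the open-ended last group receives d = l."""
    d = [l[i] - l[i + 1] for i in range(len(l) - 1)]
    d.append(l[-1])             # closes the table so that the d_x sum to 1
    return d


# Hypothetical probabilities of dying for age groups starting at 0, 1, 5 and 10.
# The last group is treated as open-ended, so everyone alive at 10 dies in it.
q = [0.05, 0.01, 0.02, 1.0]
l = survival_curve(q)
d = death_distribution(l)
```

Because the sum in (A2) telescopes, the proportions of deaths always add up to the radix, whatever probabilities are supplied.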

Morbidity curves
In recent years, GBD has been publishing estimates of age- and sex-specific health-adjusted life expectancy (HALE) for all countries from 1990 to 2019. 1,5 These estimates are obtained using models that incorporate data of years lived with disability, life tables, and standard demographic methods, 1 but the underlying 'life tables in good health' remain unknown. However, if one has age-specific mortality data from a given population and makes mild assumptions of the average person-years lived in each age interval by individuals dying in that interval (the ${}_na_x$ values), 3 it is possible to reconstruct the full life table. To estimate the ${}_na_x$ values we use the following result.

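Proposition. Let $e_x$ and $e_{x+n}$ be the remaining life expectancies at ages $x$ and $x+n$, respectively. Let ${}_nq_x$ be the unconditional probability of dying between ages $x$ and $x+n$. Then, the average person-years lived between ages $x$ and $x+n$ by individuals dying in that age interval is given by

$${}_na_x = \frac{e_x - \left(1 - {}_nq_x\right)\left(e_{x+n} + n\right)}{{}_nq_x} \qquad \text{(A3)}$$

Proof. Equation (A3) is an immediate result of the properties and relationships of the different columns of the life table. Following (A1), the probability of surviving from birth to age $x+n$ can be estimated as

$$\ell_{x+n} = \ell_x \left(1 - {}_nq_x\right)$$

provided that $(1 - {}_nq_x)$ is the probability of surviving from $x$ to $x+n$. Let $T_x$ and $T_{x+n}$ be the person-years lived above ages $x$ and $x+n$, respectively. By definition, the remaining life expectancies at ages $x$ and $x+n$ are

$$e_x = \frac{T_x}{\ell_x} \quad \text{and} \quad e_{x+n} = \frac{T_{x+n}}{\ell_{x+n}}$$

The person-years lived between ages $x$ and $x+n$ are contributed by those surviving the interval ($n$ years each) and by those dying in it (${}_na_x$ years each, on average), so that

$$T_x - T_{x+n} = n\,\ell_{x+n} + {}_na_x \, {}_nq_x \, \ell_x$$

Substituting the expressions above and dividing by $\ell_x$ gives $e_x = (1 - {}_nq_x)(e_{x+n} + n) + {}_na_x \, {}_nq_x$, and solving for ${}_na_x$ yields (A3).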

Corollary.
Let $e_x$ and $e_{x+n}$ be the remaining life expectancies at ages $x$ and $x+n$, respectively. Let ${}_na_x$ be the average person-years lived between ages $x$ and $x+n$ by individuals dying in that age interval. Then, re-arranging terms in (A3), the age-specific unconditional probability of death can be estimated as

$${}_nq_x = \frac{e_{x+n} + n - e_x}{e_{x+n} + n - {}_na_x} \qquad \text{(A7)}$$

Equation (A7) is the key relationship that enables building a full life table using data on age-specific remaining life expectancy only, assuming data are available for all ages and that the ${}_na_x$ values are provided or can be reasonably estimated.
Using data on age-specific probabilities of dying (${}_nq_x$) and age-specific remaining life expectancy ($e_x$) from GBD, 4 for each country/region, sex, and year we calculated the ${}_na_x$ values by applying (A3). Next, we used the corresponding ${}_na_x$ values from each life table, in combination with the age-specific HALE estimates, 5 to reconstruct morbidity curves and the age-at-morbidity onset distributions.
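The relationship between (A3) and (A7) can be checked numerically. The sketch below assumes the standard life-table identity $e_x = (1 - {}_nq_x)(e_{x+n} + n) + {}_na_x\,{}_nq_x$, from which both equations follow; the function names and the input values are hypothetical and do not come from GBD.

```python
def avg_years_lived_by_dying(e_x, e_next, q_x, n):
    """(A3): a_x = (e_x - (1 - q_x) * (e_{x+n} + n)) / q_x."""
    return (e_x - (1.0 - q_x) * (e_next + n)) / q_x


def prob_dying(e_x, e_next, a_x, n):
    """(A7): q_x = (e_{x+n} + n - e_x) / (e_{x+n} + n - a_x)."""
    return (e_next + n - e_x) / (e_next + n - a_x)


# Hypothetical inputs for the 70-74 age group (not GBD estimates).
e_70, e_75, q_70, n = 14.0, 11.0, 0.15, 5

a_70 = avg_years_lived_by_dying(e_70, e_75, q_70, n)   # roughly 2.67 years
q_back = prob_dying(e_70, e_75, a_70, n)               # recovers q_70
```

Applying (A7) to the value obtained from (A3) returns the original probability of dying, which is the property that allows a life table to be rebuilt from remaining life expectancies once the ${}_na_x$ values are known.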
Formally, let $e^*_x$ denote the remaining health-adjusted life expectancy (HALE) at age $x$, and $X^* = \{0, 1, 5, 10, \dots, 90, 95\}$ the set of starting ages of all the age groups for which GBD reports HALE data. 5 In the following we add an asterisk as superscript to denote all the terms that relate to morbidity instead of mortality. Using (A7), the age-specific probabilities of health loss can be estimated as

$${}_nq^*_x = \frac{e^*_{x+n} + n - e^*_x}{e^*_{x+n} + n - {}_na_x} \qquad \text{(A8)}$$

for all $x \in X^* \setminus \{95\}$. For the last age group 95+ we assume ${}_{\infty}q^*_{95} = 1$. Once these probabilities are estimated, applying analogous formulas to (A1) and (A2), we reconstructed the survival curves in good health or morbidity curves (figure 1 panel B in the main manuscript)

$$\ell^*_x = \prod_{y \in X^*,\; y < x} \left(1 - {}_nq^*_y\right) \qquad \text{(A9)}$$

and the age-at-morbidity onset distributions

$${}_nd^*_x = \ell^*_x - \ell^*_{x+n} \qquad \text{(A10)}$$

As previously, $\ell^*_0 = 1$ and for the last age group 95+ ${}_{\infty}d^*_{95} = \ell^*_{95}$. The ${}_nd^*_x$ values denote the proportion of individuals ceasing to be in good health between ages $x$ and $x+n$, and can be used to measure the variability of ages-at-morbidity onset and calculate healthy lifespan inequality (HLI) indicators (figure 1 panel D in the main manuscript).
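The reconstruction of a morbidity curve can be sketched in a few lines of Python. The example assumes that (A8) has the same form as (A7) with HALE in place of life expectancy, and that (A9) and (A10) mirror (A1) and (A2). All input values are hypothetical, and the toy table starts at age 80 rather than at birth to keep it short.

```python
def health_loss_probs(e_star, a, widths):
    """(A8) for every closed age group; the open-ended last group gets q* = 1."""
    q_star = []
    for i, n in enumerate(widths):
        upper = e_star[i + 1] + n
        q_star.append((upper - e_star[i]) / (upper - a[i]))
    q_star.append(1.0)
    return q_star


def morbidity_curve_and_onset(q_star):
    """(A9) and (A10): survival in good health and age-at-morbidity onset distribution."""
    l_star = [1.0]
    for q in q_star[:-1]:
        l_star.append(l_star[-1] * (1.0 - q))
    d_star = [l_star[i] - l_star[i + 1] for i in range(len(l_star) - 1)]
    d_star.append(l_star[-1])   # everyone still in good health enters the last group
    return l_star, d_star


# Hypothetical HALE at ages 80, 85, 90 and 95, and a_x for the three closed groups.
e_star = [6.0, 4.2, 3.0, 2.2]
a = [2.5, 2.4, 2.3]
widths = [5, 5, 5]

q_star = health_loss_probs(e_star, a, widths)
l_star, d_star = morbidity_curve_and_onset(q_star)
```

In this toy input the HALE values exceed the corresponding ${}_na_x$ values at every age, so all probabilities of health loss fall strictly between 0 and 1.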

Adjustments of the ${}_na_x$ values
Equation (A8) combines the age-specific HALE estimates from GBD ($e^*_x$), 5 and the average person-years lived in each age interval by individuals dying in that interval (${}_na_x$) obtained by applying (A3) to GBD mortality data. 4 This equation works as long as two conditions are met for all $x \in X^* \setminus \{95\}$:
1. $e^*_{x+n} + n > e^*_x$ and $e^*_{x+n} + n > {}_na_x$, so that the probabilities are positive; and
2. $e^*_x > {}_na_x$ to ensure that the denominator is larger than the numerator and ${}_nq^*_x \in (0,1)$.
Condition 1 is always met, but condition 2 is not, particularly at older ages. This is, in part, because GBD mortality estimates go up to age 110+, whereas HALE estimates end at 95+. In 5-year age groups, ${}_na_x$ values hover around 2.5 and start decreasing, approximately, at ages above 75 years. However, it may happen that at ages 75 to 95 years the remaining health-adjusted life expectancy is considerably lower than 2.5, therefore $e^*_x < {}_na_x$ and condition 2 is not met. See, for instance, the case of Chinese males in 1990: at age 90, applying (A3) we get ${}_5a_{90} = 1.68$, but GBD reports $e^*_{90} = 1.62$. In these situations, the ${}_na_x$ values needed to be adjusted before applying (A8) to avoid ${}_nq^*_x > 1$.
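The effect of violating condition 2 can be seen with the two figures quoted for Chinese males in 1990. The sketch below assumes the form of (A8) implied by the two conditions; the HALE value at age 95 is a hypothetical placeholder, since it is not given in the text, and the function names are illustrative.

```python
def check_conditions(e_star_x, e_star_next, a_x, n):
    """Return (condition 1, condition 2) for one closed age group."""
    cond1 = (e_star_next + n > e_star_x) and (e_star_next + n > a_x)
    cond2 = e_star_x > a_x
    return cond1, cond2


def health_loss_prob(e_star_x, e_star_next, a_x, n):
    """(A8): q*_x = (e*_{x+n} + n - e*_x) / (e*_{x+n} + n - a_x)."""
    return (e_star_next + n - e_star_x) / (e_star_next + n - a_x)


a_90 = 1.68        # from the text: obtained with (A3) for Chinese males, 1990
e_star_90 = 1.62   # from the text: HALE at age 90 reported by GBD
e_star_95 = 1.20   # hypothetical value, used only to make the example computable

cond1, cond2 = check_conditions(e_star_90, e_star_95, a_90, 5)
q_star_90 = health_loss_prob(e_star_90, e_star_95, a_90, 5)
```

With these inputs condition 1 holds and condition 2 fails, and the resulting probability of health loss exceeds 1, which is the inconsistency that the adjustment is meant to remove.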
From the 20,790 morbidity life tables reconstructed (204 countries and territories, 19 regions, 7 super-regions, and the global level, for 3 sex groups and 30 years), in 18,038 (86.8%) of them $e^*_x > {}_na_x$ for all age groups and condition 2 was always met. Inconsistencies were detected in 2,752 (13.2%) of the cases, among which
1) In 2,050 (9.9%) of the cases the issue was solved by imputing
b) Next, we applied these ratios to calculate ${}_na_x$ from $e^*_x$ whenever inconsistencies were detected and to ensure that $e^*_x > {}_na_x$. The estimated ratio of the 105−110 age group was applied to the second-last group (90−95) of the morbidity life tables, and so on.

The standard deviation as a measure of inequality
Using life table notation, the standard deviation of an age-at-death distribution beginning at age $\alpha$ is defined as

$$SD_\alpha = \sqrt{\frac{1}{\ell_\alpha} \sum_{x \in X,\; x \ge \alpha} {}_nd_x \left(x + {}_na_x - \alpha - e_\alpha\right)^2} \qquad \text{(A11)}$$

where $\alpha$ is the age at which the age-at-death distribution starts (in this paper we only report $SD_\alpha$ for $\alpha = 0$ and $\alpha = 65$), $\ell_\alpha$ is the initial life table population at age $\alpha$, $e_\alpha$ is the remaining life expectancy at age $\alpha$, and ${}_nd_x$ and ${}_na_x$ are, respectively, the proportion of deaths and the average person-years lived in the interval by those dying in the interval (or average age at death in the interval) between $x$ and $x+n$. This is a widely used, basic indicator that measures the variability in the ages at death around the mean of the distribution, and that we adopt as a lifespan inequality (LI) indicator.
The same formula can be applied to an age-at-morbidity onset distribution (whose derivation is explained in Appendix 2) to calculate the corresponding level of healthy lifespan inequality (HLI). Thus, the standard deviation of the age-at-morbidity onset distribution beginning at age $\alpha$ is calculated as

$$SD^*_\alpha = \sqrt{\frac{1}{\ell^*_\alpha} \sum_{x \in X^*,\; x \ge \alpha} {}_nd^*_x \left(x + {}_na_x - \alpha - e^*_\alpha\right)^2} \qquad \text{(A12)}$$

where $\ell^*_\alpha$ is the initial life table population in good health, $e^*_\alpha$ is the health-adjusted remaining life expectancy (HALE) at age $\alpha$, and ${}_nd^*_x$ is the proportion of individuals ceasing to be in good health between ages $x$ and $x+n$. The HLI indicator $SD^*_\alpha$ measures the variability in individuals' healthy lifespans.
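A minimal Python sketch of the standard deviation calculation follows. It assumes the weighted form in which each age group contributes its proportion of events times the squared distance between the mean age at the event in the group and the overall mean, with the starting age written as `alpha`. For internal consistency the remaining expectancy is computed from the same toy distribution instead of being supplied externally; all values are hypothetical.

```python
from math import sqrt


def remaining_expectancy(ages, d, a, alpha):
    """Mean number of years lived beyond alpha, implied by the distribution d."""
    idx = [i for i, x in enumerate(ages) if x >= alpha]
    l_alpha = sum(d[i] for i in idx)   # survivors at alpha = sum of later events
    return sum(d[i] * (ages[i] + a[i] - alpha) for i in idx) / l_alpha


def sd_from_age(ages, d, a, alpha):
    """Standard deviation of the ages at the event (death or morbidity onset) above alpha."""
    idx = [i for i, x in enumerate(ages) if x >= alpha]
    l_alpha = sum(d[i] for i in idx)
    e_alpha = remaining_expectancy(ages, d, a, alpha)
    var = sum(d[i] * (ages[i] + a[i] - alpha - e_alpha) ** 2 for i in idx) / l_alpha
    return sqrt(var)


# Hypothetical toy distribution with four age groups; the last one is open-ended,
# so its a value stands for the mean years lived beyond age 10.
ages = [0, 1, 5, 10]
d = [0.1, 0.1, 0.2, 0.6]
a = [0.5, 2.0, 2.5, 30.0]

e_0 = remaining_expectancy(ages, d, a, 0)
sd_0 = sd_from_age(ages, d, a, 0)
```

The same function serves for LI and HLI: passing an age-at-death distribution gives the former, and passing an age-at-morbidity onset distribution gives the latter.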

Other inequality measures
We compared our lifespan inequality (LI) and healthy lifespan inequality (HLI) measures based on the standard deviation with those derived from using the coefficient of variation ($CV$) and the Gini coefficient ($G$) as inequality measures by applying the following formulas: The coefficient of variation is simply the relative version of the standard deviation, while the Gini coefficient is a very popular index of inequality based on the expected absolute difference between two randomly chosen observations, relative to the mean.
As shown in Figs. S1 and S2, all measures of HLI are highly correlated (correlation coefficients above 0.91), both for females and males.

Uncertainty estimation
We assessed the uncertainty of the LI and HLI estimates based on the uncertainty of the input data from GBD. Uncertainty was obtained by sampling, in Monte Carlo simulations, from the 95% uncertainty intervals of life expectancy ($e_x$), death probabilities (${}_nq_x$), and HALE ($e^*_x$) reported by GBD, applying an approach similar to that used elsewhere. 6 We assumed GBD data are normally distributed with mean values equal to the point estimates. For each country/region, sex, and year we proceeded as follows:
1) For each age-specific estimate ($e_x$, ${}_nq_x$, and $e^*_x$), we approximated the standard deviation by dividing the width of the corresponding 95% uncertainty interval by 3.92.
2) We randomly drew 10,000 samples of age-specific $e_x$, ${}_nq_x$, and $e^*_x$ from a truncated normal distribution with means equal to the point estimates and corresponding standard deviations:
   a) For $e_x$ and $e^*_x$ we set a lower bound of 0 to only have positive values, with no upper limit.
   b) For ${}_nq_x$ the distribution was truncated between 0 and 1.
3) For each draw of age-specific $e_x$ and ${}_nq_x$, we applied (A1) and (A2) to calculate 10,000 sets of survival curves ($\ell_x$) and age-at-death distributions (${}_nd_x$).
4) From the sets of ${}_nd_x$, we applied (A11) to obtain 10,000 estimates of $SD_\alpha$ at ages $\alpha = 0$ and $\alpha = 65$, from which we calculated the corresponding 80% uncertainty intervals of the LI levels reported in the paper.
5) In addition, for each draw of age-specific $e^*_x$, we applied (A8) to calculate 10,000 sets of age-specific probabilities of health loss (${}_nq^*_x$).
   a) We did not incorporate any uncertainty on the ${}_na_x$ values, as this generated too much noise. We used the point estimates of ${}_na_x$ calculated as described in Appendices 2.2 and 2.3.
   b) When applying (A8) to random draws of $e^*_x$ we faced a new challenge: Condition 1 in Appendix 2.3 was not always met. Due to randomness, it may happen that $e^*_{x+n} + n < e^*_x$ or $e^*_{x+n} + n < {}_na_x$. As a result, ${}_nq^*_x$ could take any value and was not restricted to the interval (0,1).
   c) To address this issue, we re-sampled ${}_nq^*_x$ from a log-normal distribution.
      i) For each sample of age-specific ${}_nq^*_x$ obtained by applying (A8) we calculated its mean $m$ and its standard deviation $s$.
      ii) We then used standard formulae to obtain the usual log-normal parameters mean $\mu$ and variance $\sigma^2$, given by $\mu = \log\left(\frac{m^2}{\sqrt{m^2 + s^2}}\right)$ and $\sigma^2 = \log\left(1 + \frac{s^2}{m^2}\right)$.
      iii) By drawing random values from a truncated log-normal distribution with upper bound set at 1 and these $\mu$ and $\sigma^2$ parameters we ensured ${}_nq^*_x \in (0,1)$.
6) Next, we applied (A9) and (A10) to calculate 10,000 sets of morbidity curves ($\ell^*_x$) and age-at-morbidity onset distributions (${}_nd^*_x$).
7) From the sets of ${}_nd^*_x$, we applied (A12) to obtain 10,000 estimates of $SD^*_\alpha$ at ages $\alpha = 0$ and $\alpha = 65$, from which we calculated the corresponding 80% uncertainty intervals of the HLI levels reported in the paper.
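The sampling ingredients of this procedure can be sketched with the Python standard library. This is an illustration under stated assumptions, not the code used for the paper: truncation is implemented by simple rejection sampling, the 80% interval is taken from the 10th and 90th percentiles of the draws (the text does not specify how intervals were computed), and all function names and numerical inputs are hypothetical.

```python
import math
import random


def sd_from_interval(lower, upper):
    """Approximate the standard deviation from a 95% uncertainty interval."""
    return (upper - lower) / 3.92


def draw_truncated_normal(rng, mean, sd, low=-math.inf, high=math.inf):
    """Draw from a normal distribution, rejecting values outside (low, high)."""
    while True:
        value = rng.gauss(mean, sd)
        if low < value < high:
            return value


def lognormal_params(m, s):
    """Moment matching: log-scale mean and variance from mean m and sd s."""
    mu = math.log(m ** 2 / math.sqrt(m ** 2 + s ** 2))
    sigma2 = math.log(1.0 + s ** 2 / m ** 2)
    return mu, sigma2


def draw_truncated_lognormal(rng, mu, sigma2, high=1.0):
    """Draw from a log-normal distribution, rejecting values at or above high."""
    sigma = math.sqrt(sigma2)
    while True:
        value = rng.lognormvariate(mu, sigma)
        if value < high:
            return value


def percentile_interval(values, level=0.80):
    """Assumed percentile-based interval (10th and 90th percentiles for 80%)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int((1 - level) / 2 * last)], ordered[int((1 + level) / 2 * last)]


rng = random.Random(1)

# Hypothetical HALE point estimate of 3.0 years with a 95% interval of (2.6, 3.4).
sd_e = sd_from_interval(2.6, 3.4)
e_draws = [draw_truncated_normal(rng, 3.0, sd_e, low=0.0) for _ in range(1000)]

# Hypothetical mean and sd of a sample of health-loss probabilities.
mu, sigma2 = lognormal_params(0.8, 0.1)
q_draws = [draw_truncated_lognormal(rng, mu, sigma2) for _ in range(1000)]
low_80, high_80 = percentile_interval(q_draws)
```

Moment matching guarantees that the untruncated log-normal distribution has mean $m$ and standard deviation $s$; truncating at 1 shifts these moments slightly, which the sketch does not correct for.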

Transparency and replicability
We carried out our analyses using the open-source statistical software R (version 4.1.1). 7 The source code to replicate the analyses, input data, and results are publicly available for research purposes on the GitHub repository https://github.com/panchoVG/HLI.

Global, regional, and national estimates of the ratio between healthy lifespan inequality and lifespan inequality (HLI/LI) for the period 1990−2019, and the corresponding 80% uncertainty intervals, are available on the GitHub repository https://github.com/panchoVG/HLI (file 'RatiosHLI-LI-1990-2019.csv').

Figure S3. Global and regional trends in the ratio between healthy lifespan inequality (HLI) and lifespan inequality (LI) by sex, 1990−2019. Shadowed areas represent 80% uncertainty intervals. Source: Authors' elaboration based on GBD data. 4,5